We examine the large-network, low-loading behaviour of an attractor neural network, the so-called
bistable gradient network (BGN). We use analytical and numerical methods to characterize
the attractor states of the network and their basins of attraction. The energy landscape is more 
complex than that of the Hopfield network and depends on the strength of the coupling among 
units. At weak coupling, the BGN acts as a highly selective associative memory; the input must 
be close to one of the stored patterns in order to be recognized. A category of spurious
attractors occurs which is not present in the Hopfield network. Stronger coupling results in a 
transition to a more Hopfield-like regime with large basins of attraction. The basins of attraction 
for spurious attractors are noticeably suppressed compared to the Hopfield case, even though the 
Hebbian synaptic structure is the same and there is no stochastic noise. 



I. INTRODUCTION



Many neural network models, in addition to their potential applications to computation, robotics and
artificial intelligence, constitute intriguing dynamical systems in their own right, showing unusual manifestations of the
statistical mechanics phenomena of order, disorder and frustration. The connection between neural networks and
statistical mechanics became especially clear with the introduction of the Hopfield model, which furnishes a
model of associative memory, or the recall of a memorized pattern from an incomplete stimulus. This model has a
well-defined energy function and is closely related to the Sherrington-Kirkpatrick spin glass model.

In this paper we consider a Hopfield-like network of $N$ bistable elements, the bistable gradient network or BGN,
which was introduced previously. A closely related model has also been discussed and suggested as a model for the
so-called "bistability of perception" in the interpretation of ambiguous visual stimuli. The network's dynamics
consists of a continuous gradient descent described by the coupled differential equations

$$\frac{dx_i}{dt} = -\frac{\partial H}{\partial x_i}, \tag{1}$$

where $x_i$ ($1 \le i \le N$) are continuous-valued real state variables associated with the $N$ nodes of the network and the
Hamiltonian or energy function is given by

$$H = H_0 + H_{\rm int} + H_{\rm ext} = \sum_{i=1}^{N}\left(\frac{x_i^4}{4} - \frac{x_i^2}{2}\right) - \frac{\gamma}{2}\sum_{i,j=1}^{N} w_{ij}x_ix_j - \sum_{i=1}^{N} b_ix_i. \tag{2}$$

The quantities $w_{ij}$ form a symmetric matrix of coupling strengths, and the quantities $b_i$ are bias terms or external
inputs to the network. For the remainder of this paper we will set all $b_i = 0$ unless otherwise stated; we include
them here only for the sake of generality. $\gamma$ is a control parameter determining the strength of the internode couplings
relative to the local terms. The variables $x_i$ can be viewed as components of an $N$-dimensional state vector $\mathbf{x}$. We
define a normalized inner product between two state vectors $\mathbf{x}^1$ and $\mathbf{x}^2$ by
$\mathbf{x}^1\cdot\mathbf{x}^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^1x_i^2$. The first term in
the Hamiltonian represents a local double-well potential for each node, making each node individually bistable. This
local potential constitutes the main difference between the BGN and the Hopfield model. The classical Hopfield
network (HN), which we consider by way of comparison, is described by the Hamiltonian



$$H_{\rm HN} = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}x_ix_j, \tag{3}$$

where the $x_i$ are now discrete state variables restricted to the values $\pm 1$. Although continuous versions of the HN
have also been studied, these generally lack the bistability property, and their behaviour is essentially similar to that
of the discrete version [10].
The variables $x_i$ can be thought of as the outputs of processing units or neurons. Their dynamical equations can
be written as

$$\frac{dx_i}{dt} = x_i - x_i^3 + h_i, \tag{4}$$

where $h_i = \gamma\sum_{j=1}^{N} w_{ij}x_j + b_i$ is the input to the neuron from internal and external connections. By analogy with
Ising spin systems, we also refer to $h_i$ as a "magnetic field." The steady-state output for a given input is a solution
of the fixed-point equation

$$x_i - x_i^3 + h_i = 0. \tag{5}$$

When $h = 0$, there are stable fixed points at $x = \pm 1$ and an unstable fixed point at $x = 0$. An applied field shifts
the positions of the fixed points. A saddle-node bifurcation occurs when $|h| = h_c = 2\sqrt{3}/9 \approx 0.385$, so that for larger
values of the field there is only one equilibrium, aligned with the field ($x$ and $h$ have the same sign). $x$ is in principle
unbounded; the output does not truly saturate when the input is large. The double-valuedness and the lack of
saturation are the principal differences between the input-output relation for the BGN and that of the Hopfield model,
including its continuous versions.
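The single-node picture above can be checked numerically. The following is a minimal sketch, not taken from the original work: the helper names `node_fixed_points` and `is_stable` are hypothetical, and the cubic is solved with `numpy.roots`.

```python
import numpy as np

H_C = 2.0 * np.sqrt(3.0) / 9.0  # critical field, about 0.385

def node_fixed_points(h):
    """Real solutions of x - x**3 + h = 0, sorted in increasing order."""
    # numpy.roots takes coefficients from the highest power down:
    # -x**3 + 0*x**2 + x + h
    roots = np.roots([-1.0, 0.0, 1.0, h])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real)

def is_stable(x):
    """A fixed point is stable when d/dx (x - x**3 + h) = 1 - 3*x**2 < 0."""
    return 1.0 - 3.0 * x**2 < 0.0

# Below the critical field there are three equilibria, above it only one.
for h in (0.0, 0.2, 0.5):
    pts = node_fixed_points(h)
    print(h, [(round(float(x), 3), bool(is_stable(x))) for x in pts])
```

For fields just below `H_C` the unstable point and one stable point approach each other near $|x| = 1/\sqrt{3}$ and annihilate, which is the saddle-node bifurcation described in the text.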

Numerous experimental studies have been made on intriguing chemical analogs of the BGN (see, e.g., [12]).
These studies involved networks of bistable chemical reactors coupled either electrically or through mass exchange.
Previous computational work on small BGNs suggested that under some conditions the network might permit
the storage of a larger number of patterns than in a HN of the same size, without any modification of the basic Hebb
learning rule. It was noted, however, that the stability of a particular attractor can depend on the control parameter
$\gamma$. Some dependence of pattern stability on the coupling strength had also been noted in the experiments on the
mass-coupled chemical networks.

In this paper we focus on the behaviour of the network in the case where the number of nodes is large and the 
number of memorized patterns is small. Using both analytical techniques and numerical simulations [15], we examine
the retrieval of stored patterns and classify the attractors that occur. We find that there are three types of attractors. 
In addition to memory or retrieval states, there are spurious attractors in which no pattern is fully recognized. These 
include the mixture or spin glass states familiar from HN studies, as well as an additional category specific to the 
BGN which we refer to as uncondensed states. We examine how the attractors and their basins of attraction change 
as the control parameter $\gamma$ is changed. Throughout the paper, we compare our model to the zero-temperature or
deterministic discrete Hopfield model. It is hoped that these results can illuminate some of the novel behavior of the 
BGN and clarify its relation to the HN. The behavior of the BGN under higher memory loading and the question 
of its maximum storage capacity will be addressed elsewhere. 



II. STORAGE AND RETRIEVAL OF BINARY PATTERNS 

As in previous work on Hopfield networks, we define the task of associative memory as follows. We are
given a set of $p$ distinct $N$-dimensional vectors or memory patterns $\boldsymbol{\xi}^\mu$ ($\mu \in \{1, \ldots, p\}$), which are to be recognized
by the network. The patterns should correspond to attractors of the network's dynamics. We will refer to these
attractors as retrieval states. Input is given by imposing a particular initial condition on the network. If that initial
condition is sufficiently close to one of the memorized patterns, then the network's state should converge to the correct
nearby attractor, and we say that the network has recognized or retrieved the pattern. In this paper we follow the
HN literature in considering the case where the patterns are random and uncorrelated strings of $+1$'s and $-1$'s. We
read the output of the network according to the signs of the $x_i$. Thus we say that the network has recalled pattern 1,
for example, if $\mathrm{sign}(x_i) = \xi_i^1$ for all $i$. Although variations in the magnitude of $x_i$ can be important to the dynamics,
we will for the moment ignore them for the purpose of reading the output. As we will see below, the retrieval states
in general do not have $|x_i| = 1$ even though the patterns have $|\xi_i^\mu| = 1$. We focus here on the limiting case $N \to \infty$,
$p/N \ll 1$, or large networks with low memory loading. (Strictly speaking, we take $N$ to infinity while $p$ remains
finite.) In this case the inner product of a pair of patterns $\boldsymbol{\xi}^\mu\cdot\boldsymbol{\xi}^\nu = \frac{1}{N}\sum_{i}\xi_i^\mu\xi_i^\nu$ behaves as a Gaussian random
variable with zero mean and variance $1/N$, so that in the $N \to \infty$ limit the pattern vectors are nearly orthogonal to
each other and form a basis for a $p$-dimensional subspace of the $N$-dimensional configuration space.

As in the HN, we construct the coupling matrix from the stored patterns according to the Hebb learning rule:

$$w_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu - \frac{p}{N}\delta_{ij}. \tag{6}$$

The term $-(p/N)\delta_{ij}$ is included to make all diagonal elements of the coupling matrix zero. Non-zero diagonal entries
would have the effect of adding an additional quadratic self-interaction term. Following the usual practice we omit
them here so that the quadratic term is contained only in the local potential. For the case $p/N \to 0$, however, the
effect of the diagonal elements is negligible and we can substitute the simpler learning rule

$$w_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu. \tag{7}$$

A useful set of order parameters are the overlaps $m_\mu$, which are inner products of the network's state with each of
the stored patterns:

$$m_\mu = \boldsymbol{\xi}^\mu\cdot\mathbf{x} = \frac{1}{N}\sum_{i=1}^{N}\xi_i^\mu x_i. \tag{8}$$

For the discrete HN, these variables take values $-1 \le m_\mu \le 1$, while for the BGN any real values are possible. It
will be useful to define another set of variables, which we will call "bit overlaps," by

$$b_\mu = \boldsymbol{\xi}^\mu\cdot\mathrm{sign}(\mathbf{x}) = \frac{1}{N}\sum_{i=1}^{N}\xi_i^\mu\,\mathrm{sign}(x_i). \tag{9}$$

The bit overlap is simply related to the Hamming distance by $b_\mu = \frac{1}{N}\left(N - 2d(\mathbf{x}, \boldsymbol{\xi}^\mu)\right)$, where the Hamming distance
$d(\mathbf{x}, \mathbf{y})$ between two vectors is defined as the number of elements for which their signs differ, or the number of positions
$i$ such that $x_iy_i < 0$. Unlike $m_\mu$, the bit overlaps always obey $-1 \le b_\mu \le 1$. They encode information about sign
agreements but not about magnitudes of the outputs $x_i$.
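As an illustration of these definitions, the sketch below builds a simplified Hebb matrix from random patterns and evaluates the overlaps, the bit overlaps and the Hamming distance for one example state. The function names, the random seed and the example state are assumptions made here for illustration; patterns are indexed from 0 in the code.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 5

# Random, uncorrelated +/-1 patterns; row mu is the pattern xi^mu.
xi = rng.choice([-1.0, 1.0], size=(p, N))

# Simplified Hebb rule: w_ij = (1/N) sum_mu xi_i^mu xi_j^mu.
w = xi.T @ xi / N

def overlaps(x, xi):
    """m_mu = (1/N) sum_i xi_i^mu x_i, one entry per pattern."""
    return xi @ x / xi.shape[1]

def bit_overlaps(x, xi):
    """b_mu = (1/N) sum_i xi_i^mu sign(x_i); depends on signs only."""
    return xi @ np.sign(x) / xi.shape[1]

def hamming(x, y):
    """Number of positions where the signs of x and y differ."""
    return int(np.sum(x * y < 0))

# Example state: pattern 0 scaled by 1.3 with its first 100 signs reversed.
x = 1.3 * xi[0].copy()
x[:100] *= -1.0

m = overlaps(x, xi)
b = bit_overlaps(x, xi)
d = hamming(x, xi[0])
```

In this example `b[0]` equals `(N - 2 * d) / N`, while `m[0]` is larger by the factor 1.3 because it also carries the magnitude of the outputs.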

The definitions of the overlap variables allow us to rewrite the Hamiltonian and the dynamical equations in useful
forms. In particular, if the synaptic matrix $w$ is given by the simplified Hebb rule (7), then the interaction term
of the Hamiltonian can be rewritten in terms of $m_\mu$ as follows:

$$H_{\rm int} = -\frac{\gamma}{2}\sum_{i,j=1}^{N} w_{ij}x_ix_j = -\frac{\gamma}{2N}\sum_{i,j=1}^{N}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu x_ix_j = -\frac{\gamma N}{2}\sum_{\mu=1}^{p}(m_\mu)^2, \tag{10}$$

and the net input to a given node from the other nodes is given by

$$-\frac{\partial H_{\rm int}}{\partial x_i} = \gamma\sum_{j=1}^{N} w_{ij}x_j = \frac{\gamma}{N}\sum_{j=1}^{N}\sum_{\mu=1}^{p}\xi_i^\mu\xi_j^\mu x_j = \gamma\sum_{\mu=1}^{p} m_\mu\xi_i^\mu. \tag{11}$$
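The two rewritings in terms of the overlaps are exact for the simplified Hebb rule, which is easy to confirm numerically. The sketch below compares the matrix form and the overlap form of the interaction energy and of the net input for an arbitrary real-valued state; the variable names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, gamma = 200, 4, 1.5

xi = rng.choice([-1.0, 1.0], size=(p, N))   # stored patterns
w = xi.T @ xi / N                            # simplified Hebb rule
x = rng.normal(size=N)                       # arbitrary real-valued state
m = xi @ x / N                               # overlaps m_mu

# Interaction energy: matrix form versus overlap form.
h_int_matrix = -0.5 * gamma * x @ w @ x
h_int_overlap = -0.5 * gamma * N * np.sum(m**2)

# Net input to each node: matrix form versus overlap form.
field_matrix = gamma * w @ x
field_overlap = gamma * m @ xi
```

The overlap form costs of order $Np$ operations per evaluation instead of $N^2$, which is convenient when integrating the dynamics at low loading.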



A. Retrieval states at low memory loading 



To show that the network functions properly as an associative memory, we exhibit the attractor states corresponding 
to the stored patterns, demonstrate their stability, and show that a pattern can be retrieved from an initial condition 
which lies close to the pattern but differs from it by one or more incorrect signs. 

Consider the state $\mathbf{x} = M\boldsymbol{\xi}^\nu$ where $M$ is a scalar and $\boldsymbol{\xi}^\nu$ is a particular one of the stored patterns. We will show
that for a suitable value of $M$ this state represents a stable fixed point of the dynamics and is therefore the retrieval
state we seek. In this state, $m_\nu = M$, $b_\nu = 1$, and all other overlaps are small. The field acting on the $i$-th node can
be written as follows:

$$h_i = \gamma\xi_i^\nu M + \gamma\sum_{\mu\neq\nu} m_\mu\xi_i^\mu. \tag{12}$$

The sum over patterns $\mu \neq \nu$ is called the crosstalk term. For the overlaps with these other patterns we have

$$m_\mu = \frac{1}{N}\sum_{i}\xi_i^\mu x_i = \frac{M}{N}\sum_{i}\xi_i^\mu\xi_i^\nu \qquad (\mu \neq \nu). \tag{13}$$
Since the patterns are random, each of these overlaps is of order $M/\sqrt{N}$. The number of patterns remains finite as
$N \to \infty$, so the sum in the crosstalk term vanishes in this limit and $h_i \approx \gamma\xi_i^\nu M$. A stationary state must satisfy
the fixed point condition (5) for each node, which leads to a self-consistency condition on $M$:

$$0 = x_i^3 - x_i - h_i \approx (M\xi_i^\nu)^3 - (1+\gamma)(M\xi_i^\nu) \;\;\Longrightarrow\;\; 0 = M^3 - (1+\gamma)M. \tag{14}$$

In the last step, we have used the fact that $(\xi_i^\nu)^3 = \xi_i^\nu$ when $\xi_i^\nu = \pm 1$ and then divided out the common factor of $\xi_i^\nu$.
Solutions to this condition are an unstable equilibrium $M = 0$ and two stable equilibria

$$M = m_\nu = \pm\sqrt{1+\gamma}. \tag{15}$$

The two stable solutions represent perfect retrieval of pattern $\nu$ and its mirror state, respectively. The doubled state
is a consequence of the $Z_2$ symmetry in the Hamiltonian. Since the overlap $m_\nu$ is equal to $M$ and all other overlaps
vanish in the thermodynamic limit, the energy of this retrieval state is easily calculated using expression (10) for the
energy in terms of the overlap variables, giving:

$$E_{\rm ret} = N\left(\frac{M^4}{4} - \frac{M^2}{2}\right) - \frac{\gamma N}{2}M^2 = -\frac{N}{4}(1+\gamma)^2. \tag{16}$$

Note that this energy expression is extensive (proportional to $N$) and a monotonically decreasing function of $\gamma$.
Having identified the state

$$x_i = \sqrt{1+\gamma}\,\xi_i^\nu \tag{17}$$

as an equilibrium state, we now demonstrate its stability using a linear stability analysis of the dynamical equations

$$\frac{dx_i}{dt} \equiv y_i = x_i - x_i^3 + \gamma\sum_{j} w_{ij}x_j. \tag{18}$$
Evaluating the Jacobian $\partial y_i/\partial x_j$ at the fixed point (17) we obtain:

$$\frac{\partial y_i}{\partial x_j} = (1 - 3x_i^2)\delta_{ij} + \gamma w_{ij} = (1 - 3(\gamma+1))\delta_{ij} + \gamma w_{ij} = (-2-3\gamma)\delta_{ij} + \gamma w_{ij}, \tag{19}$$

where $\delta_{ij}$ is the Kronecker delta. The fixed point is linearly stable if and only if the Jacobian has no positive
eigenvalues. This depends in turn on the eigenvalues of the synaptic connection matrix $w$. But in the limit where
all of the stored patterns $\boldsymbol{\xi}^\mu$ are mutually orthogonal, the stored patterns are themselves eigenvectors spanning a
degenerate subspace with eigenvalue 1, while the complement of this subspace has eigenvalue 0. (The Hebb rule (7)
itself gives a spectral decomposition of $w$.) Since the maximum eigenvalue of $w$ is 1, the largest eigenvalue of the
Jacobian is $-2-2\gamma$. The Jacobian at the retrieval fixed point therefore has no positive eigenvalues, and the retrieval
state is linearly stable for any value of $\gamma$. In fact, all eigenvalues become more negative as $\gamma$ increases. We reiterate
that this result is valid in the ideal limit of large $N$ and low loading where the stored patterns are orthogonal. For
finite-sized networks with finite overlaps among the patterns, it is possible for the memory states to be destabilized
by the crosstalk terms. This issue will be examined elsewhere.
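The eigenvalue argument can be illustrated with exactly orthogonal patterns. The sketch below is an illustration under stated assumptions rather than the procedure used in this work: it takes rows of a Sylvester-type Hadamard matrix as patterns (the helper `hadamard` is defined here) and checks that the largest eigenvalue of the Jacobian at the retrieval state equals $-2-2\gamma$.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

N, p = 64, 3
xi = hadamard(N)[1:p + 1]       # p exactly orthogonal +/-1 patterns
w = xi.T @ xi / N               # simplified Hebb rule; eigenvalues are 0 and 1

def retrieval_jacobian(gamma):
    """Jacobian at x = sqrt(1 + gamma) * xi^nu (the same matrix for every nu)."""
    return (-2.0 - 3.0 * gamma) * np.eye(N) + gamma * w

for gamma in (0.0, 0.5, 2.0, 6.0):
    top = np.linalg.eigvalsh(retrieval_jacobian(gamma)).max()
    print(gamma, top)
```

With random rather than orthogonal patterns the largest eigenvalue of `w` is slightly above 1, so the margin of stability is somewhat smaller than in this idealized case.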

Numerical results for a network with $N = 1000$ nodes and $p = 5$ random patterns agree very well with the
above description. To study a retrieval state numerically, we initialized the network to the state $\mathbf{x} = \boldsymbol{\xi}^1$ (arbitrarily
choosing the first pattern). Starting at $\gamma = 0$, we increased $\gamma$ by small steps to $\gamma = 6$. At each step, we integrated
the dynamical equations until they converged. This procedure allows us to examine the evolution of a state under
quasistatic changes in the control parameter $\gamma$. We verified that $b_1$ remained equal to 1 over the whole range $0 \le \gamma \le 6$,
indicating that the retrieval state is stable. The measured values of $m_1$ and $E$ were within 1% of the theoretical
expressions (15) and (16) respectively.
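A quasistatic sweep of this kind can be sketched in a few lines. The code below is a minimal illustration and not the integration scheme used for the results quoted above: it uses a fixed-step Euler method, and the step size, tolerance, seed, sweep increment and function name `relax` are assumptions chosen for simplicity.

```python
import numpy as np

def relax(x, xi, gamma, dt=0.02, tol=1e-8, max_steps=20000):
    """Euler-integrate dx/dt = x - x**3 + gamma * sum_mu m_mu xi^mu to a fixed point."""
    n = xi.shape[1]
    for _ in range(max_steps):
        m = xi @ x / n                         # overlaps with all patterns
        v = x - x**3 + gamma * (m @ xi)        # right-hand side of the dynamics
        if np.max(np.abs(v)) < tol:
            break
        x = x + dt * v
    return x

rng = np.random.default_rng(2)
N, p = 1000, 5
xi = rng.choice([-1.0, 1.0], size=(p, N))

x = xi[0].copy()                       # start in the first pattern
results = []
for gamma in np.arange(0.0, 6.01, 0.25):
    x = relax(x, xi, gamma)            # reuse the previous state (quasistatic sweep)
    m1 = xi[0] @ x / N                 # overlap with the first pattern
    b1 = xi[0] @ np.sign(x) / N        # bit overlap with the first pattern
    results.append((gamma, m1, b1))
```

Each entry of `results` can be compared with the prediction $m_1 = \sqrt{1+\gamma}$; a bit overlap of 1 throughout indicates that no signs were flipped during the sweep.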
B. Error correction and basins of attraction



Linear stability analysis has shown that the retrieval states are stable against infinitesimal perturbations for any
value of $\gamma$, but this does not guarantee their stability against the flipping of signs of one or more nodes. In order to
function as an associative memory, a network must be capable of dynamically correcting sign errors. When presented
with an input at a small, nonzero Hamming distance from one of the stored patterns (i.e., differing from it by a few
reversed signs) it must be able to flip the reversed signs and restore the correct pattern. We will now show that there
is a critical value $\gamma_c = 1/3$ above which the correction of sign-flip errors can occur. For smaller values of $\gamma$ the BGN
does not correct sign errors and thus does not truly function as an associative memory, but as $\gamma$ increases above $1/3$
the retrieval states develop increasingly large basins of attraction.

Consider a state of the network which is a slightly corrupted retrieval state: all node variables have the values
$x_i = M\xi_i^\nu = \sqrt{1+\gamma}\,\xi_i^\nu$, with the exception of one or possibly some number $\ll N$ of nodes which may be misaligned.
In such a state the few misaligned nodes make only a small contribution to the overlap sums, so we have $m_\nu \approx M$ and
$m_\mu \approx 0$ ($\mu \neq \nu$). The field acting on each node is therefore $h_i \approx \gamma M\xi_i^\nu$. The misaligned bits experience a field opposite
to their signs. If the field becomes larger than the critical value $2\sqrt{3}/9$, then there is only one stable equilibrium for
each node, and the misaligned nodes will flip to conform with the stored pattern. Error correction therefore occurs if

$$|h_i| \approx \gamma M \approx \gamma\sqrt{1+\gamma} > \frac{2\sqrt{3}}{9} \approx 0.385. \tag{20}$$

The critical value occurs when the equality $\gamma\sqrt{1+\gamma} = 2\sqrt{3}/9$ holds, or at $\gamma_c = 1/3$.
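For completeness, the critical coupling can be obtained in closed form. The short derivation below is a worked check added for clarity; it squares the threshold condition and factors the resulting cubic.

```latex
\gamma\sqrt{1+\gamma} = \frac{2\sqrt{3}}{9}
\;\Longrightarrow\; \gamma^{2}(1+\gamma) = \frac{12}{81} = \frac{4}{27}
\;\Longrightarrow\; 27\gamma^{3} + 27\gamma^{2} - 4 = 0
\;\Longrightarrow\; (3\gamma - 1)(3\gamma + 2)^{2} = 0 .
```

The only non-negative root is $\gamma_c = 1/3$, and indeed $\frac{1}{3}\sqrt{4/3} = \frac{2}{3\sqrt{3}} = \frac{2\sqrt{3}}{9}$.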

If the pattern is more strongly corrupted (a significant number of bits misaligned) then the situation is more
complicated, because the presence of a larger number of misaligned bits may reduce the value of $m_\nu$, and thus the
magnitude of the field. The misaligned bits have a significant back-reaction on the ones with the correct sign. The
correction of larger numbers of sign errors requires higher values of $\gamma$. We will return to this point later; the basic
result is that when $\gamma$ is only slightly above the threshold of $1/3$, the memory states have rather small basins of attraction,
but these basins grow as $\gamma$ increases.



III. SPURIOUS ATTRACTORS: SPIN GLASS STATES 



In the case of the HN, the Hebb learning rule results in a large number of "spurious" attractors in addition to the 
retrieval states. The energy function defines a rugged landscape, and a trajectory which does not start sufficiently close 
to one of the stored patterns may become trapped in one of the spurious local minima instead of one corresponding 
to a recalled pattern. It is possible to suppress the spurious minima by introducing thermal noise which allows 
trajectories to jump out of the shallower basins of attraction into deeper ones. 

At low levels of loading, the HN possesses spurious attractors which are nonlinear combinations of the stored
patterns. There is a hierarchy of symmetric mixture states of the form

$$x_i = \mathrm{sign}\left(\xi_i^{\mu_1} \pm \xi_i^{\mu_2} \pm \cdots \pm \xi_i^{\mu_n}\right). \tag{21}$$

These states overlap equally with $n$ different patterns: $m_{\mu_1} = m_{\mu_2} = \cdots = m_{\mu_n}$. Only mixtures
with odd $n$ are stable. The $n = 3$ mixtures have the lowest energy in this category, and the energies increase with
$n$, asymptotically approaching $-1/\pi$. As the number of stored patterns $p$ increases, these spurious states proliferate
exponentially; their number is of order $3^p$. There are also non-symmetric mixtures. The proliferation of spurious
states is associated with spin-glass-type behaviour in the HN. Accordingly we also refer to these mixed states somewhat
loosely as spin glass states [17].

We will show here that the BGN possesses mixture states analogous to those of the HN, but their structure is
slightly more complex. Let us focus on the $n = 3$ symmetric mixture state with positive signs. For the HN, this
state is given by

$$x_i = \xi_i^{\rm mix} \equiv \mathrm{sign}\left(\xi_i^{\mu_1} + \xi_i^{\mu_2} + \xi_i^{\mu_3}\right). \tag{22}$$

This state is stable against individual sign flips because each node is subject to a non-zero magnetic field which
maintains its alignment. To see this, note that there are two possibilities for each bit. Either all three patterns agree
at that particular site ($\xi_i^{\mu_1} = \xi_i^{\mu_2} = \xi_i^{\mu_3}$) or one of the patterns has the opposite sign from the other two, for example
$\xi_i^{\mu_1} = \xi_i^{\mu_2} = -\xi_i^{\mu_3}$. When all three agree, we say that the $i$-th bit is a "unanimous" bit. If the patterns are random,
then each $\xi_i^\mu$ is $\pm 1$ with equal probability, giving a probability of $1/4$ that a given bit is unanimous. The mixture
state has equal overlaps with all three of the patterns. Since for a given $i$ there is a probability of $3/4$ that $x_i = \xi_i^{\mu_1}$,
we have $m_{\mu_1} = \frac{1}{N}\sum_{i=1}^{N}\xi_i^{\mu_1}x_i \approx \frac{3}{4} - \frac{1}{4} = \frac{1}{2}$, and likewise $m_{\mu_2} = m_{\mu_3} \approx \frac{1}{2}$. The field acting at the $i$-th site is

$$h_i = \sum_{\nu} m_\nu\xi_i^\nu = \frac{1}{2}\left(\xi_i^{\mu_1} + \xi_i^{\mu_2} + \xi_i^{\mu_3}\right). \tag{23}$$

(Consistent with the low-loading, large-$N$ limit, we ignore all other overlaps, which are of order $1/\sqrt{N}$.) This gives
$h_i = \frac{3}{2}\xi_i^{\rm mix}$ for unanimous bits and $h_i = \frac{1}{2}\xi_i^{\rm mix}$ for the others. In either case, each node of the HN experiences a field which
stabilizes its alignment.
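These probabilities are easy to confirm by sampling. The sketch below is an illustrative Monte Carlo check with assumed parameter values: it draws three random patterns, forms the majority-vote state, and measures the fraction of unanimous bits, the three overlaps, and the alignment $h_i x_i$ of the Hopfield field with each node.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000
xi = rng.choice([-1, 1], size=(3, N))          # three random patterns

x = np.sign(xi.sum(axis=0))                    # majority vote; a sum of three +/-1 is never 0
unanimous = np.abs(xi.sum(axis=0)) == 3        # sites where all three patterns agree

m = xi @ x / N                                 # overlaps with the three patterns
h = m @ xi                                     # Hopfield field h_i = sum_nu m_nu xi_i^nu

frac_unanimous = unanimous.mean()
aligned = h * x                                # positive when the field supports x_i
```

Up to sampling fluctuations of order $1/\sqrt{N}$, `frac_unanimous` is close to 1/4, every overlap is close to 1/2, and `aligned` takes values near 3/2 on unanimous sites and near 1/2 elsewhere.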

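We will now show that the BGN, like the HN, possesses a mixture state in which the sign of $x_i$ is given by the
majority vote of three of the stored patterns:

$$\mathrm{sign}\,x_i = \mathrm{sign}\left(\xi_i^{\mu_1} + \xi_i^{\mu_2} + \xi_i^{\mu_3}\right). \tag{24}$$

This state has a more complicated structure, however, because the magnitude of $x_i$ at a given node depends
significantly on the local field at that node. Since the field at a unanimous bit is stronger than the field at a non-unanimous
bit, we expect the magnitude of $x_i$ to be larger for a unanimous bit. Therefore, we make the ansatz

$$x_i = \begin{cases} A\,\mathrm{sign}\left(\xi_i^{\mu_1} + \xi_i^{\mu_2} + \xi_i^{\mu_3}\right) & \text{if } \xi_i^{\mu_1} = \xi_i^{\mu_2} = \xi_i^{\mu_3}, \\ D\,\mathrm{sign}\left(\xi_i^{\mu_1} + \xi_i^{\mu_2} + \xi_i^{\mu_3}\right) & \text{otherwise,} \end{cases} \tag{25}$$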



where $A$ and $D$ are real numbers. The dynamical equations for the network give a pair of self-consistency equations
which can be solved numerically for $A$ and $D$. First, we need an expression for the overlap of the mixture state with
one of the three patterns, say, pattern $\mu_1$. If the $i$-th bit is a unanimous bit, then $x_i$ has magnitude $A$ and agrees in
sign with $\xi_i^{\mu_1}$. On the other hand, if it is a non-unanimous bit, then $x_i$ has magnitude $D$ and has a 2/3 probability
of agreeing in sign with $\xi_i^{\mu_1}$. The result is that

$$m_{\mu_1} = \frac{1}{4}A + \frac{3}{4}\left(\frac{2}{3} - \frac{1}{3}\right)D = \frac{1}{4}(A + D). \tag{26}$$

Note that in the special case $A = D = 1$, the above expression reduces to the Hopfield value $1/2$, as it should. Again,
all three overlaps have the same size: $m_{\mu_1} = m_{\mu_2} = m_{\mu_3}$. The total energy of the network in this state is given by
$$E = N\left[\frac{1}{4}\left(\frac{A^4}{4} - \frac{A^2}{2}\right) + \frac{3}{4}\left(\frac{D^4}{4} - \frac{D^2}{2}\right) - \frac{3\gamma}{32}(A + D)^2\right]. \tag{27}$$

We can think of this as an energy function on the restricted family of states parametrized by $A$ and $D$.
A necessary condition for the mixture state (25) to be a fixed point is that $\partial E/\partial A = \partial E/\partial D = 0$. This gives us two
self-consistency conditions for the parameters $A$ and $D$:

$$A^3 - A - \frac{3\gamma}{4}(A + D) = 0, \qquad 3D^3 - 3D - \frac{3\gamma}{4}(A + D) = 0. \tag{28}$$

These are the equations of the nullclines of the energy function (27). Alternatively, the above equations could be
derived directly from the dynamical equations for each node and the expressions for $h_i$ instead of using the energy
function. For a stable fixed point, $(A, D)$ must be a local minimum of the energy function (27). The self-consistency
equations (28) can be rewritten as follows:



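$$D = \frac{4}{3\gamma}A^3 - \left(1 + \frac{4}{3\gamma}\right)A, \qquad A = \frac{4}{\gamma}D^3 - \left(1 + \frac{4}{\gamma}\right)D. \tag{29}$$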
FIG. 1: Self-consistency equations for the $n = 3$ mixture state. (A) Stable solutions occur in the first and third quadrants.
(B) When $\gamma < 2$ two additional stable solutions appear in the second and fourth quadrants.



The graphs of these two cubic equations are plotted for $\gamma = 3$ in figure 1A. Solutions are points where the two
nullclines intersect. Note that the slope, $dD/dA$, of the first curve at the origin is $-\left(1 + \frac{4}{3\gamma}\right) < -1$, and for
the second curve $\frac{dD}{dA} = -\frac{\gamma}{\gamma + 4} > -1$. These two inequalities satisfied by the slopes ensure that the curves always
intersect in at least 5 points. The five solutions can be classified by looking at the energy function and its gradient.
We see that the solution $A = D = 0$ is an unstable node (maximum of the energy), the two solutions in the 2nd and
4th quadrants are saddle points, and the two solutions in the first and third quadrants are the stable solutions we
seek. (There are two because of the $Z_2$ sign reversal symmetry: one is a mirror state of the other.)

The self-consistency equations for $A$ and $D$ were solved numerically for a range of $\gamma$ values using a gradient descent
algorithm. These values were in turn substituted into (27) to find the energy as a function of $\gamma$. The results are
plotted in figure 2, where they are also compared with numerical results from dynamical simulations of a BGN with
$N = 1000$. We studied the mixture state numerically by initializing a network to the state $x_i = \mathrm{sign}(\xi_i^1 + \xi_i^2 + \xi_i^3)$
and incrementing $\gamma$ beginning at 0, much as was done for the retrieval states. Figure 2A shows the magnitudes of
unanimous and non-unanimous bits in the mixture state as functions of $\gamma$. The solid lines show the solutions of the
self-consistency equations for $A$ and $D$. The symbols show the observed magnitudes $|x_i|$ for $1 \le i \le 6$ in the simulated
network state. Two of these first six bits are unanimous while the other four are not. There is good agreement
between the observed values of $|x_i|$ and the values obtained from the self-consistency equations. For comparison,
$\sqrt{1+\gamma}$ is plotted as a dotted line on the same axes. (Recall that this is the value of all $|x_i|$ in a pure retrieval state.)
Figure 2B shows a corresponding comparison of the observed and theoretical energies. Finally, figure 2C compares
the $n = 3$ mixture state with the retrieval state by plotting the ratios $A/\sqrt{1+\gamma}$ and $D/\sqrt{1+\gamma}$ as well as the ratio of
the mixture state energy $E_{\rm mix}$ to that of a retrieval state $E_{\rm ret}$. All three of these ratios appear to approach constant
asymptotic values as $\gamma$ increases. Asymptotically, $E_{\rm mix}/E_{\rm ret} \approx 0.7$, while for the HN the corresponding ratio is
0.75. The strength of the field acting on each unanimous bit, $h_A$, and that acting on the non-unanimous bits,
$h_D$, both increase as $\gamma$ increases. The mixture state is stable against single sign flips of the unanimous bits when
$h_A > 2\sqrt{3}/9$ and stable against any single sign flip when $h_D > 2\sqrt{3}/9$. Thus as $\gamma$ increases, the mixture state begins to
develop a non-trivial basin of attraction of its own.
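A gradient descent of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it minimizes the energy per node of the two-parameter mixture ansatz, $e(A, D) = \frac{1}{4}\left(\frac{A^4}{4} - \frac{A^2}{2}\right) + \frac{3}{4}\left(\frac{D^4}{4} - \frac{D^2}{2}\right) - \frac{3\gamma}{32}(A + D)^2$, which follows from the unanimous fraction 1/4 and the overlap $(A + D)/4$. The function names, learning rate and starting point are choices made here.

```python
import numpy as np

def mixture_energy(a, d, gamma):
    """Energy per node of the n = 3 mixture ansatz (unanimous fraction 1/4)."""
    return (0.25 * (a**4 / 4 - a**2 / 2)
            + 0.75 * (d**4 / 4 - d**2 / 2)
            - 3.0 * gamma / 32.0 * (a + d) ** 2)

def solve_mixture(gamma, a=1.0, d=1.0, lr=0.05, tol=1e-12, max_steps=200000):
    """Gradient descent on mixture_energy, started in the first quadrant."""
    for _ in range(max_steps):
        ga = 0.25 * (a**3 - a) - 3.0 * gamma / 16.0 * (a + d)   # d(energy)/dA
        gd = 0.75 * (d**3 - d) - 3.0 * gamma / 16.0 * (a + d)   # d(energy)/dD
        if max(abs(ga), abs(gd)) < tol:
            break
        a, d = a - lr * ga, d - lr * gd
    return a, d

for gamma in (0.5, 1.0, 3.0, 6.0):
    a, d = solve_mixture(gamma)
    # Ratio of the mixture energy to the retrieval energy -(1 + gamma)**2 / 4 per node.
    e_ratio = mixture_energy(a, d, gamma) / (-(1.0 + gamma) ** 2 / 4.0)
    print(gamma, round(a, 4), round(d, 4), round(e_ratio, 4))
```

Starting from $(A, D) = (1, 1)$ keeps the descent in the first quadrant, so it converges to the mixture solution rather than to the additional zero-overlap solutions that exist at small $\gamma$.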



FIG. 2: $n = 3$ mixture state compared with retrieval state. (A) The two solid lines represent $A$ and $D$, the values of $|x|$ for
unanimous and non-unanimous bits respectively, obtained by numerical solution of the self-consistency equations. The symbols
show the observed values $|x_i|$ ($1 \le i \le 6$) for a mixture state of the dynamically simulated network. The dotted line is $\sqrt{1+\gamma}$,
which is the theoretical value of all $|x_i|$ for a retrieval state. (B) Energy $E_{\rm mix}$ for the mixture state. Solid line = solution
of self-consistency equations; symbols = observed energy of simulated mixture state; dotted line = energy $E_{\rm ret}$ of retrieval state
from eq. (16). (C) The ratios $A/\sqrt{1+\gamma}$ and $D/\sqrt{1+\gamma}$ (upper and lower solid lines) and $E_{\rm mix}/E_{\rm ret}$ (dotted line) approach
asymptotically constant values.



It is interesting to note that at $\gamma = 2$ a saddle-node bifurcation occurs and for $\gamma < 2$ two additional stable solutions
to the self-consistency equations (28) appear in the second and fourth quadrants, at $(A, D) = (1, -1)$ and $(-1, 1)$.
(See figure 1B.) These are states with $E = -N/4$ in which $m_{\mu_1} = m_{\mu_2} = m_{\mu_3} = 0$ and the net field acting on each
node exactly cancels. They are not stable against sign flips, and become completely destabilized when $\gamma > 2$. As
we will see in the following section, they do not properly belong to the category of mixture or spin glass states, but
rather to another class of spurious attractors present only in the BGN at low values of $\gamma$.

Here we examined only the $n = 3$ mixture state, but similar methods may be used to characterize higher-order
mixtures. In general, they are more complex as there are more possibilities for the size of the majority by which the
sign is determined. The magnitudes of $x_i$ then take a greater number of distinct values.



IV. UNCONDENSED STATES AND THEIR COLLAPSE TO THE PATTERN SUBSPACE 



In section II B, we noted that for values of $\gamma$ not far above $1/3$, it is possible that a state may have a significant
overlap with one of the stored patterns but that the field acting on the nodes may nonetheless not be strong enough
to overcome the potential barrier and correct the sign errors. Indeed if $\gamma$ is below $1/3$, then even a single sign error
may go uncorrected. This consequence of the bistability of the BGN units contrasts with the behavior of the HN.

Consider first the case of the HN. A typical random initial state has small but nonzero overlaps with the memorized
patterns, $m_\mu \sim O(1/\sqrt{N})$, resulting in fields $h_i = \sum_\mu m_\mu\xi_i^\mu$ which are random with zero mean and typical magnitude of order
$1/\sqrt{N}$. Typically, for approximately half of the nodes $x_i$ and $h_i$ will initially have opposite signs. Since there is
no potential barrier against sign flips, those nodes will change their signs, and the sign flips will continue until the
field experienced by every node is aligned with $x_i$. Every sign flip will increase the magnitude of one or more of the
overlap variables. If, for example, one overlap $m_\nu$ is larger than all of the others, then most nodes will experience
fields which tend to align them with pattern $\nu$. Every sign flip further increases the value of $m_\nu$, and eventually $\boldsymbol{\xi}^\nu$
will be fully retrieved even though the initial overlap may have been quite small. However, if one overlap does not
clearly dominate the others, then the trajectory may arrive at a spurious attractor which has roughly equal overlaps
with several patterns, instead of at a single one of the memory states. Even a state which is initially orthogonal to
all of the memorized patterns can be rendered unstable by changing a single sign: even a single sign flip will create a
small but nonzero field affecting the other nodes, resulting in further sign flips, and so on. In summary, for the HN
essentially any initial condition converges under the dynamics to an attractor lying in or close to the $p$-dimensional
subspace of the patterns.

For the BGN, on the other hand, the situation is different due to the presence of potential barriers. Just as with
the HN, given any state $\mathbf{x}$ every node $i$ experiences a field $h_i$, which may be aligned with or opposed to $x_i$. However,
the antiparallel local fields may not be strong enough to flip their nodes into the parallel direction. If most $h_i$ are
well below the threshold $h_c$ then the flipping of one or a few nodes will not change the field enough to cause any
further flips. Thus there might be a large number of initial conditions which remain stuck with low overlaps, far
away from any of the patterns. We refer to such states with sub-threshold fields as "uncondensed" states, because
in those states none of the order parameters $m_\mu$ are condensed. However, we will show below that states with
low overlaps cannot remain stable for $\gamma > 2$, and thus for higher values of $\gamma$ the behavior is Hopfield-like, with all
trajectories collapsing toward the pattern subspace.

Consider a hypothetical state which is strictly orthogonal to all memory patterns, so that $m_\mu = 0$ for all $\mu$. (The
extra solutions appearing in the self-consistency equations for the mixture state when $\gamma < 2$ are examples of such
states.) If $m_\mu = 0$ for all $\mu$, then $h_i = 0$ for all $i$. In this case, the steady state of each node is $x_i = \pm 1$. Proceeding
with linear stability analysis as above, we find the relevant Jacobian

$$\frac{\partial y_i}{\partial x_j} = (1 - 3x_i^2)\delta_{ij} + \gamma w_{ij} = -2\delta_{ij} + \gamma w_{ij}. \tag{30}$$

The equilibrium is unstable if $\gamma w_{\max} > 2$, where $w_{\max}$, the largest eigenvalue of the coupling matrix $w$, is at
least 1. Therefore, if $\gamma > 2$, states orthogonal to the stored patterns are all unstable. If all of the $p$ stored patterns
are mutually orthogonal (which is approximately true in the limit $N \to \infty$) then for $\gamma > 2$ there are $p$ unstable
eigenvalues.

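We can examine one of those unstable directions, say, the one associated with the $\nu$-th pattern, more closely
by making an explicit ansatz. Let us denote the orthogonal, zero-overlap state by $x_i = \zeta_i$; by assumption
$m_\mu = \boldsymbol{\zeta}\cdot\boldsymbol{\xi}^\mu = 0$ for all $\mu$ including $\mu = \nu$. For half of the nodes $\zeta_i = \xi_i^\nu$; for the other half $\zeta_i = -\xi_i^\nu$. Let us then
consider a family of states described by the ansatz

$$x_i = \begin{cases} A\,\zeta_i & \text{if } \zeta_i = \xi_i^\nu, \\ B\,\zeta_i & \text{if } \zeta_i = -\xi_i^\nu. \end{cases} \tag{31}$$

This parametrizes a 2-dimensional subspace of the state space which contains $\boldsymbol{\zeta}$ and one of its unstable eigenvectors.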



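Using the methods of section III, one obtains a pair of cubic self-consistency equations for $A$ and $B$:

$$B = -\frac{2A^3}{\gamma} + \left(\frac{2}{\gamma} + 1\right)A, \qquad A = -\frac{2B^3}{\gamma} + \left(\frac{2}{\gamma} + 1\right)B. \tag{32}$$

For all $\gamma$ there are two solutions $(A, B) = (\pm\sqrt{1+\gamma}, \mp\sqrt{1+\gamma})$, which correspond to the retrieval states
$\mathbf{x} = \pm\sqrt{1+\gamma}\,\boldsymbol{\xi}^\nu$. For $\gamma < 2$ there are additional stable solutions $(A, B) = \pm(1, 1)$ corresponding to the zero-overlap
state and its mirror image. A bifurcation occurs at $\gamma = 2$ and these solutions become saddle points.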
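The location of this bifurcation can be checked with a short calculation. The sketch below assumes that the two-parameter family multiplies the zero-overlap state by $A$ on the half of the nodes where it agrees with $\boldsymbol{\xi}^\nu$ and by $B$ on the other half, so that $m_\nu = (A - B)/2$ and, for orthogonal patterns, all other overlaps vanish. The reduced energy per node $e(A, B)$ is notation introduced here.

```latex
e(A,B) = \frac{1}{2}\left(\frac{A^{4}}{4} - \frac{A^{2}}{2}\right)
       + \frac{1}{2}\left(\frac{B^{4}}{4} - \frac{B^{2}}{2}\right)
       - \frac{\gamma}{8}(A - B)^{2},
\qquad
\frac{\partial e}{\partial A} = 0 \;\Longleftrightarrow\;
B = -\frac{2A^{3}}{\gamma} + \left(\frac{2}{\gamma} + 1\right)A .

\left.\frac{\partial^{2} e}{\partial A^{2}}\right|_{(1,1)}
= \left.\frac{\partial^{2} e}{\partial B^{2}}\right|_{(1,1)} = 1 - \frac{\gamma}{4},
\qquad
\left.\frac{\partial^{2} e}{\partial A\,\partial B}\right|_{(1,1)} = \frac{\gamma}{4}
\;\Longrightarrow\;
\lambda_{+} = 1, \quad \lambda_{-} = 1 - \frac{\gamma}{2} .
```

The eigenvalue $\lambda_-$ belongs to the direction $(1, -1)$, along which $A - B$ and hence $m_\nu$ grows. It changes sign at $\gamma = 2$, so $(1, 1)$ is a minimum of the reduced energy for $\gamma < 2$ and a saddle point for $\gamma > 2$.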

Thus, we see that there is an absolute upper limit for the existence of stable uncondensed states. In fact, 
$\gamma = 2$ turns out to be a high upper bound. The example of a state with all $m_\mu$ equal to zero is a sort of "worst- 
case scenario." For a finite-sized network the typical random initial condition has small but nonzero overlaps. In 
addition, if the patterns are truly random then they will not be exactly orthogonal but have small overlaps and so 
the largest eigenvalue of the synaptic matrix will be slightly larger than unity. Because of these factors the typical 
uncondensed state becomes unstable at values of $\gamma$ lower than 2; in numerical simulations we found that for the case 
$N = 1000$, $p = 5$ most become unstable between $\gamma = 1$ and $1.5$. 
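The effect of non-orthogonal patterns on the largest eigenvalue can be illustrated with a short computation. This sketch assumes Hebbian couplings built from independent random $\pm 1$ patterns, and uses the fact that the $N \times N$ coupling matrix shares its nonzero eigenvalues with the $p \times p$ Gram matrix of the patterns. The seed and function name are arbitrary.

```python
import numpy as np

def largest_coupling_eigenvalue(N, p, seed=0):
    """Largest eigenvalue of W = (1/N) sum_mu xi^mu (xi^mu)^T for random patterns."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1.0, 1.0], size=(p, N))
    gram = patterns @ patterns.T / N     # p x p, same nonzero spectrum as W
    return float(np.linalg.eigvalsh(gram).max())

w_max = largest_coupling_eigenvalue(1000, 5)
print("largest eigenvalue:", w_max)
print("linear instability bound 2 / w_max:", 2.0 / w_max)
```

Because the diagonal of the Gram matrix is exactly 1, its mean eigenvalue is 1 and the largest is at least 1. The resulting shift of the bound $2/w_{\max}$ is modest, so this factor alone does not account for the instability observed between $\gamma = 1$ and $1.5$; the nonzero initial overlaps also contribute.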

In figures 3-6, we show numerical results for the fate of a typical random initial condition of the BGN with $N = 1000$, 
$p = 5$. Figures 3-5 show the dynamical evolution of the same initial condition at different values of $\gamma$. The initial 
condition was a random string of $\pm 1$ values. We plot the energy per node, all five overlap variables $m_\mu$ and all five 
bit overlap variables $b_\mu$ as functions of time. Recall that the $b_\mu$ contain information about sign agreements only. For 
$\gamma = 0.5$ (figure 3) the state changes very little before convergence occurs. The energy per node remains very close to 
$-0.25$. The overlaps $m_\mu$ increase slightly in magnitude, but the bit overlaps do not change at all, indicating that 
no sign flips occur. For $\gamma = 1$ (figure 4) the trajectory is similar, except that the small initial overlaps are amplified 
to a greater extent (we will explain this effect below). The bit overlaps still do not change. When $\gamma = 1.2$, however, 
the trajectory changes qualitatively (figure 5). The magnitudes of the overlaps grow slowly until at $t \approx 7$ the 
resulting field becomes strong enough to begin flipping some signs. At this point the bit overlaps begin to change, 
the energy drops significantly and the trajectory moves close to the pattern subspace. After some further evolution, 
the system converges to a mixture state which overlaps with several patterns. A different random initial condition, 
followed again at $\gamma = 1.2$, leads instead to a memory state (figure 6). Here one of the five overlaps becomes dominant 
and the others shrink away. In this case the mirror state of one of the five patterns is retrieved. These trajectories 
are typical examples representing descent on a rugged energy landscape. Different initial conditions lead to different 
attractors, of which some are memory states and some are mixtures. Frequently the trajectory lingers at one or 
several states before settling at its asymptotic attractor. 

FIG. 3: Trajectory of a random initial condition for $\gamma = 0.5$. Note: The unequal time steps result from the adaptive step size 
control in our integration algorithm. 

FIG. 4: Trajectory of the same initial condition, $\gamma = 1.0$. 
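A trajectory experiment of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the integration scheme used for the figures: it assumes node dynamics $\dot{x}_i = x_i - x_i^3 + \gamma \sum_j w_{ij} x_j$ with Hebbian couplings (diagonal kept), an energy $E = \sum_i (-x_i^2/2 + x_i^4/4) - \frac{\gamma}{2}\sum_{ij} w_{ij} x_i x_j$, and a fixed-step forward Euler integrator in place of adaptive step size control. Step size, seed and function name are arbitrary choices.

```python
import numpy as np

def simulate_bgn(gamma, N=1000, p=5, dt=0.05, steps=600, seed=1):
    """Integrate the assumed BGN dynamics from a random +/-1 initial condition."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1.0, 1.0], size=(p, N))
    x = rng.choice([-1.0, 1.0], size=N)           # random string of +/-1 values
    b0 = patterns @ np.sign(x) / N                # initial bit overlaps
    for _ in range(steps):
        m = patterns @ x / N                      # overlaps m_mu
        h = gamma * (patterns.T @ m)              # local fields h_i
        x = x + dt * (x - x**3 + h)               # forward Euler step
    m = patterns @ x / N
    b = patterns @ np.sign(x) / N                 # final bit overlaps
    energy = np.sum(-0.5 * x**2 + 0.25 * x**4) - 0.5 * gamma * N * np.sum(m**2)
    return {"m": m, "b": b, "b0": b0, "energy_per_node": energy / N}

for gamma in (0.5, 1.0, 1.2):
    out = simulate_bgn(gamma)
    flips = not np.array_equal(out["b"], out["b0"])
    print(f"gamma={gamma}: E/N={out['energy_per_node']:.4f}, "
          f"max|m|={np.abs(out['m']).max():.3f}, sign flips={flips}")
```

Comparing the initial and final bit overlaps is a simple proxy for whether any sign flips took place during the evolution.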

Figures 3 and 4 illustrated that for sufficiently small values of $\gamma$, the dynamics amplifies small initial overlaps without 
flipping the sign of any node. For more insight into this phenomenon, consider an initial state in which all $x_i$ are $\pm 1$ 
but somewhat more nodes are aligned parallel with one particular pattern $\xi^\nu$ than are antiparallel. In other words, 
$b_\nu$ is nonzero but less than unity. For simplicity let us neglect all other overlaps. Initially, each node experiences a 
small field given by $h_i = \gamma m_\nu \xi_i^\nu = \gamma b_\nu \xi_i^\nu$. This field will push $x_i$ to larger magnitudes ($|x_i| > 1$) for those nodes which are 
aligned with pattern $\xi^\nu$, and it will push the others to smaller magnitudes $|x_i| < 1$. This adjustment in turn increases 
the value of $m_\nu$, until an equilibrium is reached with $m_\nu > b_\nu$. We might think of this as a kind of "subliminal" 
recognition of the pattern. The effect becomes stronger as $\gamma$ increases. Clearly it has a nonlinear dependence on 
both $\gamma$ and $b_\nu$. When the field becomes large enough it will exceed the threshold for sign flips and the state will 
be attracted toward the pattern retrieval state. The larger $\gamma$ is, the smaller the initial $b_\nu$ that is necessary to fully 
retrieve the pattern $\xi^\nu$. In other words, the basins of attraction of the patterns expand as $\gamma$ increases. 

FIG. 6: A $\gamma = 1.2$ trajectory with a different random initial condition. In this case, one of the bit overlaps reaches $-1$, while 
the others become small. This indicates that the mirror state of one of the stored patterns has been retrieved. 
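The amplification argument can be made concrete with a two-population reduction. In this sketch, which neglects all other overlaps as in the text, the aligned nodes (a fraction $(1 + b)/2$) share a common magnitude $u$, and the anti-aligned nodes (a fraction $(1 - b)/2$) have projection $-v$ onto the pattern, so that $m = \frac{1+b}{2}u - \frac{1-b}{2}v$. The variables $u$ and $v$, the Euler integrator and its parameters are assumptions introduced for illustration.

```python
def amplify(gamma, b, dt=0.01, steps=5000):
    """Evolve the two-population reduction; return (final overlap m, flipped?)."""
    u, v = 1.0, 1.0                       # all |x_i| start at 1
    f_plus, f_minus = (1.0 + b) / 2.0, (1.0 - b) / 2.0
    flipped = False
    for _ in range(steps):
        m = f_plus * u - f_minus * v      # overlap with the pattern
        du = u - u**3 + gamma * m         # aligned nodes are pushed outward
        dv = v - v**3 - gamma * m         # anti-aligned nodes are pushed inward
        u += dt * du
        v += dt * dv
        if v < 0.0:                       # anti-aligned nodes changed sign
            flipped = True
    return f_plus * u - f_minus * v, flipped

for b in (0.1, 0.3):
    m, flipped = amplify(1.0, b)
    print(f"gamma=1.0, b={b}: final m={m:.3f}, sign flips={flipped}")
```

In this reduction a small initial bit overlap settles at an overlap larger than $b$ without any sign change, while a sufficiently large one drives the anti-aligned population through zero and the state proceeds to full retrieval.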

V. BASINS OF ATTRACTION AND THE ENERGY LANDSCAPE 

In this section, we provide numerical support for the three-way classification of attractors into retrieval, spin glass 
and uncondensed states and we show how the respective attractor basins change with the control parameter $\gamma$. We 
observe an interpolation between two different regimes. As we showed above, for $\gamma > 2$ there are no stable uncondensed 
states. For lower values of $\gamma$, on the other hand, uncondensed states are numerous. Recall that uncondensed states 
are characterized by local fields too weak to overcome the potential barriers against sign flips, and so their dynamics 
is dominated by the local potential. In the extreme case $\gamma = 0$, there are of course no magnetic fields at all and only 
the local potential is present. 

A. Statistics of attractors reached from random initial conditions 

The classification of attractors is very clearly reflected in the energy spectrum. To explore attractors and their 
basins, we "seeded" the network with 500 random initial conditions (taken with several different realizations of the 
five random patterns), integrated the dynamical equations until they converged, and constructed a histogram of the 
final energies (figure 7). For the case $\gamma = 1$ (figure 7A), there are three clearly separated clusters of attractors. 
Those with the lowest energies are retrieval states, while the states clustered at $E/N = -0.25$ are the uncondensed 
states, and those in the intermediate range are the glassy states. The picture is qualitatively similar at the slightly 
larger value $\gamma = 1.25$ (figure 7B), but the peak at $E/N = -0.25$ has shrunk relative to the other two. Note that 
the energies of the retrieval and spin glass states change with $\gamma$, while the uncondensed states remain at nearly the 
same energy because their dynamics is dominated by the local potential. For $\gamma = 2$ (figure 7C), on the other hand, 
the cluster at $E/N = -0.25$ is absent as there are no stable uncondensed states. The histogram for a HN (figure 
7D) resembles that for the BGN with $\gamma = 2$. One quantitative difference is that the retrieval state peak is slightly 
higher for the BGN with $\gamma = 2$, while the glassy states are comparatively suppressed. 

We performed this experiment at a range of values of $\gamma$. In all cases the classification of states was clear from the 
energy spectrum and was verified by examining the final values of $b_\mu$. Figure 8 shows the probabilities of convergence 
to each of the three types of attractors from a random initial condition as functions of $\gamma$. At $\gamma = 0.5$ the landscape is 
dominated by the uncondensed states. Even though $\gamma = 0.5$ lies above the threshold of $1/3$ and the retrieval states have 
non-trivial basins of attraction, these basins still occupy a very small fraction of the total configuration space volume. 
The patterns can be retrieved only if the initial overlaps are relatively high, and the probability of a random initial 
condition being sufficiently close is very low. The retrieval probability becomes significant only as $\gamma$ approaches 1. 
As $\gamma$ increases from 1 to 1.5, basins for the memory and spin glass states grow at the expense of the uncondensed 
states until the latter disappear. The retrieval state basins grow faster than those of the spin glass states. Beyond 
$\gamma = 1.5$, the probability of retrieving a memory state saturates at a value approximately 10% higher than in the 
Hopfield case, and the probability of falling into a spin glass state is correspondingly lower. 
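The location of the sign-flip threshold can be reproduced from a single-node argument. The sketch below assumes that a flipped node in a retrieval state obeys $x - x^3 + h = 0$ with field magnitude $|h| = \gamma\sqrt{1 + \gamma}$, and that the flipped configuration stops being an equilibrium once $|h|$ exceeds the largest value of $x - x^3$ on $[0, 1]$. The function names are illustrative.

```python
import math

def critical_field():
    """Largest |h| for which a node can still sit against its local field."""
    x_star = 1.0 / math.sqrt(3.0)         # maximum of x - x**3 on [0, 1]
    return x_star - x_star**3             # equals 2 / (3 * sqrt(3))

def retrieval_field(gamma):
    """Field magnitude felt by each node in a retrieval state (assumed form)."""
    return gamma * math.sqrt(1.0 + gamma)

def threshold(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the gamma at which the retrieval field equals the critical field."""
    h_c = critical_field()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if retrieval_field(mid) < h_c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print("critical field:", critical_field())
print("threshold gamma:", threshold())
```

Under these assumptions the condition $\gamma^2(1 + \gamma) = 4/27$ is satisfied at $\gamma = 1/3$, and the retrieval field at $\gamma = 0.5$ lies well above the critical field.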



B. Mapping the boundaries of basins of attraction 

In an attempt to map the attractor basins in more detail, we generated configurations at specified initial Hamming 
distances from particular memory patterns. This was done by starting with a pattern $\xi^\nu$ and flipping the signs 
of a specified number of randomly chosen bits. Using an ensemble of such initial conditions, we measured the 
probability of retrieval of the target pattern as a function of the initial distance from it. As a rule, the probability 
of recognizing the pattern is high if only a few signs are flipped, but drops sharply if a certain threshold Hamming 
distance is exceeded. We are interested in learning where this threshold lies, and thus answering the question of how 
close an initial condition must be to a pattern in order to be attracted to it. We are also interested in the fate of 
states lying just outside the boundaries of a basin of attraction. In other words, does the basin share a boundary 
with the basins of other patterns, or only with spurious attractors? The results are presented in figure 9 for a BGN 
with $N = 1000$ and $p = 5$, for the three values $\gamma = 0.5$, $1.0$, and $2.0$, and also for the HN. In each of these cases, we 
generated an ensemble of initial conditions at a particular initial value of $b_\nu$ for some pattern $\nu$. Each initial condition 
was allowed to evolve under the dynamics and the resulting attractor was classified as: A) the target pattern $\nu$, B) 
one of the other patterns ($\mu \neq \nu$), C) a spin glass spurious state or D) an uncondensed state. In this case, we 
classified a state as uncondensed if no sign flips occurred during the dynamical evolution. The probabilities of each 
of these four outcomes were averaged over several realizations of the random patterns and plotted as functions of the 
initial bit overlap $b_\nu$. 

FIG. 7: Energy per node for attractors reached from random initial conditions, showing a clear separation among different 
types. (A) For $\gamma = 1$, three types of attractors are clearly present. Uncondensed states show up as a peak near $E = -0.25N$. 
(B) For $\gamma = 1.25$, the uncondensed state peak is smaller but occurs at nearly the same energy, whereas the other two peaks are 
at different energies. (C) For $\gamma = 2$, only the retrieval and spin glass states are obtained. (D) HN behaviour is qualitatively 
similar to the BGN with $\gamma = 2$. 

FIG. 8: Probability of convergence of a random initial condition to each of the three types of attractors, plotted as functions 
of $\gamma$. Hopfield values are shown for comparison. 

FIG. 9: Attractors retrieved from states with a specified initial overlap with a target pattern. Plots show probability of 
retrieving the target pattern (solid circles), one of the other memory patterns (squares), an uncondensed state (triangles) 
or a spin glass spurious state (stars). 
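The two ingredients of this experiment, generating a corrupted pattern and labelling the final state, can be sketched as follows. The tolerance used to decide that a pattern has been retrieved, the treatment of mirror states as retrievals, and the function names are assumptions made for illustration; only the "no sign flips" rule for uncondensed states is taken from the text.

```python
import numpy as np

def corrupt(pattern, n_flips, rng):
    """Flip the signs of n_flips randomly chosen bits of a +/-1 pattern."""
    x = pattern.copy()
    idx = rng.choice(pattern.size, size=n_flips, replace=False)
    x[idx] *= -1
    return x

def bit_overlaps(patterns, x):
    """Bit overlaps b_mu: agreement of sign(x) with each stored pattern."""
    return patterns @ np.sign(x) / patterns.shape[1]

def classify(x_init, x_final, patterns, target, tol=0.02):
    """Label an attractor as target, other pattern, uncondensed or spin glass."""
    b = bit_overlaps(patterns, x_final)
    k = int(np.argmax(np.abs(b)))
    if abs(b[k]) >= 1.0 - tol:            # assumed retrieval criterion (mirror included)
        return "target" if k == target else "other pattern"
    if np.array_equal(np.sign(x_init), np.sign(x_final)):
        return "uncondensed"              # no sign flips during the evolution
    return "spin glass"

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(5, 1000))
x0 = corrupt(patterns[2], 100, rng)       # Hamming distance 100 from pattern 2
print("initial bit overlaps:", bit_overlaps(patterns, x0))
```

Flipping $k$ of $N$ bits gives an initial bit overlap $b_\nu = 1 - 2k/N$, so a Hamming distance of 100 at $N = 1000$ corresponds to $b_\nu = 0.8$.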

Consider first the HN data from figure 9D. A pattern can evidently be retrieved even if the initial overlap is fairly 
small; the probability is close to unity if $b_{\rm init} \gtrsim 0.1$. If the target pattern is not retrieved, then either a spurious 
attractor or one of the other patterns may be retrieved. There is a range of $b_{\rm init}$ over which all three probabilities are 
significant, indicating that the basins for the memory states border on each other as well as those of spurious states. 
For an $N = 1000$ network, the expected magnitude of the overlap of a random state with any given one of the stored 
patterns is $1/\sqrt{N} \approx 0.03$, which is not much smaller than the apparent threshold of $b_{\rm init} \sim 0.1$. This is consistent 
with the view that for the HN, a pattern is likely to be retrieved as long as the initial overlap with that pattern is 
significantly larger than all of the other overlaps. The $\gamma = 2$ BGN (figure 9C) shares the qualitative features of the 
HN. Note however that the probability of becoming trapped in a spurious state is smaller for the BGN, consistent 
with the results in figure 8. 
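The $1/\sqrt{N}$ estimate for the overlap of a random state with a stored pattern is easy to check by sampling. This is a small sketch with an arbitrary number of trials and seed; it reports the root-mean-square overlap.

```python
import numpy as np

def rms_random_overlap(N, trials=2000, seed=0):
    """Root-mean-square overlap between random +/-1 states and one fixed pattern."""
    rng = np.random.default_rng(seed)
    pattern = rng.choice([-1.0, 1.0], size=N)
    states = rng.choice([-1.0, 1.0], size=(trials, N))
    overlaps = states @ pattern / N
    return float(np.sqrt(np.mean(overlaps**2)))

N = 1000
print("sampled RMS overlap:", rms_random_overlap(N))
print("1 / sqrt(N):        ", 1.0 / np.sqrt(N))
```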

A contrasting case is the $\gamma = 0.5$ BGN (figure 9A). In this case retrieval of the target pattern requires an 
initial overlap of more than 0.5. Although this represents a significant basin, it is highly unlikely that a random 
initial condition will have such a large overlap, thus explaining why random initial conditions almost never flow to 
a memory state. The basins of the memory states are bordered only by spurious states, not by other memory 
states. Interestingly, the states which lie adjacent to the basin of a memory pattern are not all uncondensed: some 
are spurious states of the mixture or spin glass type. Examination of the states retrieved near the boundary shows 
that these are typically asymmetric mixture states with one large overlap and two or more smaller (but greater than 
random) overlaps. Finally, for $\gamma = 1$ (figure 9B), the basins of the memory states are almost as large as in the HN 
case, and near the boundaries there is a small but nonzero probability of retrieving one of the other memory patterns, 
indicating that the basins of different memory states almost touch each other. 

FIG. 10: Schematic illustration showing qualitative features of the energy landscape for low values of $\gamma$. The numerous small 
depressions represent uncondensed states. The retrieval states and mixture states occupy isolated valleys. (Retrieval states 
are represented by the deeper valleys.) 

C. Qualitative picture of the energy landscape 

Taken together, the above results suggest a qualitative, schematic picture of the energy landscape illustrated in 
figures 10-12. The representation of the configuration space by two dimensions is not to be taken literally, since 
it is of course $N$-dimensional. At low values of $\gamma$, such as 0.5, the energy landscape is dominated by uncondensed 
states, which form a series of shallow basins, each limited to roughly a single orthant of $N$-dimensional space. These 
are represented in the diagram by a series of shallow pits. The basins of attraction for the retrieval and mixture 
states form isolated depressions in this pitted plateau. They occupy nontrivial volumes but do not lie adjacent to 
each other (with the exception of certain spin glass states which lie near the retrieval states). At intermediate values 
$\gamma \approx 1$, the basins of attraction of the retrieval states are much larger and in some places almost touch each other, 
but significant islands of uncondensed states remain. By $\gamma = 2$, however, the uncondensed states have disappeared 
and the basins of attraction for the other two types of states occupy the entire energy landscape and share boundaries 
with each other. 



VI. CONCLUSIONS 

We have studied the behaviour of the bistable gradient network in the thermodynamic, low-loading limits $N \to \infty$, 
$p/N \ll 1$. We described and classified the attractors of the dynamics and also observed the effectiveness of pattern 
retrieval as a function of the coupling parameter $\gamma$. We found that states corresponding to perfect retrieval of the 
stored patterns are linearly stable at all values of $\gamma$, and have an energy that decreases monotonically with $\gamma$. Above 
the threshold $\gamma = 1/3$ the retrieval states become stable against sign flips of one or more nodes, and the network begins 
to function as an associative memory. If $\gamma$ is not far above this threshold, then the basins of attraction of the retrieval 
states are small and input must be very close in Hamming distance to a pattern for recognition to occur. The basins 
of attraction of the retrieval states grow as $\gamma$ increases. 

FIG. 11: Energy landscape for intermediate values of $\gamma \approx 1$. The retrieval states and mixture states have large basins of 
attraction which almost touch, but significant islands of uncondensed states remain. 

FIG. 12: Energy landscape for large $\gamma$. There are no uncondensed states, and large basins of attraction occupy the whole 
landscape. 



There are two regimes of behaviour, distinguished by the types of attractors that occur. At low $\gamma$ the configuration 
space is dominated by the uncondensed states, or states in which no node experiences a field strong enough to overcome 
its potential barrier. In these states, $|x_i|$ remains close to 1 for all nodes, and the energy per node remains close to $-0.25$. 
Each of these states occupies a basin of attraction confined to approximately a single orthant. In the limit $\gamma = 0$ 
there are $2^N$ such states, all degenerate in energy. As $\gamma$ increases above the threshold $\gamma_c = 1/3$ the retrieval states and 
the mixture or spin glass states at first occupy small isolated basins among the many uncondensed states. However, 
as $\gamma$ increases further, these basins grow until they lie adjacent to each other. At some value of $\gamma$ (observed to 
lie between 1 and 1.5), the uncondensed states disappear and there is a transition to a Hopfield-like regime where 
the basins of attraction for retrieval and spin glass states cover the whole configuration space. As $\gamma$ increases 
still further, the retrieval basins grow at the expense of the spin glass states, so that the latter can be noticeably 
suppressed compared to the deterministic Hopfield case. This suppression of the spurious states occurs without 
thermal noise or a modification of the Hebb learning rule. 

The uncondensed states represent a phase which is neither "ferromagnetic" (i.e., strongly ordered and correlated 
with one pattern) nor "glassy" in the sense that frustration is an important effect, yet they cannot properly be 
described as "paramagnetic," as paramagnetism is characterized by spins which are able to flip freely from one 
orientation to the other. 

A few words on the application of such networks to practical problems of associative memory are in order. The 
goal of associative memory is to reconstruct a pattern from a more or less corrupted version or from a fragment 
of the pattern, without becoming trapped in a spurious local minimum. From this point of view, it appears that 
increasing $\gamma$ improves the performance of the network by expanding the basins of attraction for the retrieval states 
and suppressing the spurious states. The low-$\gamma$ regime, on the other hand, may be suited to applications where the 
goal is a selective associative memory, one which only recognizes a pattern from a fairly close approximation and 
thus avoids false recognition. In the low-$\gamma$ regime, if the input is not close to one of the stored patterns, then the 
network is likely to remain in an uncondensed state. These can in general be distinguished clearly from other states 
(especially retrieval states) by their relatively high energy ($E/N \approx -0.25$) or by the fact that the magnitudes of all 
$|x_i|$ remain close to 1. The magnitudes of the outputs can therefore be read as a signal of whether recognition has 
occurred. Persistence in an uncondensed state corresponds to an "I don't know" or nonrecognition response. 
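One possible way to read the output magnitudes as a recognition signal is sketched below. The cutoff, placed midway between the uncondensed magnitude 1 and the retrieval magnitude $\sqrt{1 + \gamma}$, is a hypothetical heuristic chosen for illustration and is not taken from the analysis above. It separates only the two idealized cases and makes no attempt to handle mixture states.

```python
import numpy as np

def recognition_signal(x, gamma):
    """Return True if the mean output magnitude suggests a retrieval state."""
    cutoff = 0.5 * (1.0 + np.sqrt(1.0 + gamma))   # assumed midpoint cutoff
    return bool(np.mean(np.abs(x)) > cutoff)

rng = np.random.default_rng(0)
pattern = rng.choice([-1.0, 1.0], size=1000)
gamma = 0.5

retrieved = np.sqrt(1.0 + gamma) * pattern        # idealized retrieval state
uncondensed = rng.choice([-1.0, 1.0], size=1000) * (1.0 + 0.02 * rng.standard_normal(1000))

print("retrieval state recognized:  ", recognition_signal(retrieved, gamma))
print("uncondensed state recognized:", recognition_signal(uncondensed, gamma))
```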

In a subsequent publication, we will examine the behavior of the BGN when the loading level $p/N$ is of order unity, 
and we will demonstrate another performance trade-off. Specifically, we will show that the maximum storage capacity 
of the network decreases as $\gamma$ increases. In the low-$\gamma$ regime, it is possible to stabilize more memorized patterns than 
in the Hopfield case, while at higher $\gamma$, even though the low-loading fault tolerance is increased, the storage capacity 
decreases. 